22 research outputs found

    Stable In-hand Manipulation with Finger Specific Multi-agent Shadow Reward

    Full text link
    Deep Reinforcement Learning (DRL) has shown it can handle the high degrees of freedom of control and the complex object interactions in multi-finger dexterous in-hand manipulation tasks. Current DRL approaches prefer sparse rewards over dense rewards for ease of training, but sparse rewards impose no behavior constraints during the manipulation process, leading to aggressive, unstable policies that are unsuitable for safety-critical in-hand manipulation tasks. Dense rewards can regulate the policy toward stable manipulation behaviors through continuous reward constraints but are hard to define empirically and slow to converge to an optimum. This work proposes the Finger-specific Multi-agent Shadow Reward (FMSR) method, which derives stable manipulation constraints in the form of a dense reward from the state-action occupancy measure, a general utility of DRL that is approximated during the learning process. Information Sharing (IS) across neighboring agents enables consensus training that accelerates convergence. The methods are evaluated in two in-hand manipulation tasks on the Shadow Hand. The results show that FMSR+IS converges faster in training and achieves a higher task success rate and better manipulation stability than a conventional dense reward. Compared with a policy trained on a sparse reward, FMSR+IS achieves a comparable success rate despite the behavior constraint, with much better manipulation stability.
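    The abstract leaves the shadow-reward computation at a high level. Below is a minimal sketch of the general idea under stated assumptions: a kernel-density estimate stands in for the approximated state-action occupancy measure, and the `ShadowReward` class, its bandwidth, and the 0.1 shaping weight are hypothetical illustrations, not the paper's implementation.

```python
# Sketch: occupancy-measure-based dense reward shaping (illustrative only).
import numpy as np
from sklearn.neighbors import KernelDensity

class ShadowReward:
    """Per-finger dense reward derived from a state-action occupancy estimate."""

    def __init__(self, bandwidth=0.5):
        self.kde = KernelDensity(bandwidth=bandwidth)
        self.buffer = []

    def update(self, state, action):
        # Accumulate visited (state, action) pairs and refit the density model,
        # approximating the policy's occupancy measure during learning.
        self.buffer.append(np.concatenate([state, action]))
        self.kde.fit(np.asarray(self.buffer))

    def reward(self, state, action, sparse_reward):
        # Shape the sparse task reward with the log-occupancy of the current
        # pair, penalizing visits to rarely stabilized regions of the space.
        x = np.concatenate([state, action])[None, :]
        log_density = self.kde.score_samples(x)[0]
        return sparse_reward + 0.1 * log_density  # 0.1 is an assumed weight
```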

    Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum

    Full text link
    Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary at different phases of a manipulation task. Varying priorities make it hard, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method. To solve this problem, we develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives. The AHRM determines the objective priorities during the learning process and updates the reward hierarchy to adapt to the changing priorities at different phases. The proposed method is validated in a multi-objective manipulation task with a JACO robot arm in which the robot must manipulate a target surrounded by obstacles. The simulation and physical experiment results show that the proposed method improves robot learning in both task performance and learning efficiency.
    Comment: Accepted by the Journal of Intelligent & Robotic Systems
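    As a rough illustration of how a reward hierarchy might adapt to changing objective priorities, the sketch below re-ranks objectives from their recent learning progress and weights them geometrically; the class name, decay factor, and progress estimator are assumptions, not the published AHRM.

```python
# Sketch: adaptive hierarchical reward weighting (illustrative only).
import numpy as np

class AdaptiveHierarchicalReward:
    def __init__(self, n_objectives, decay=2.0):
        self.n = n_objectives
        self.decay = decay
        self.progress = np.zeros(n_objectives)

    def update_priorities(self, per_objective_returns):
        # Track recent improvement of each objective with an exponential
        # moving average, used to infer the current phase's priorities.
        self.progress = 0.9 * self.progress + 0.1 * np.asarray(per_objective_returns)

    def reward(self, per_objective_rewards):
        # Rank objectives by how far they are from satisfied (lowest progress
        # first) and weight them geometrically to form a reward hierarchy.
        order = np.argsort(self.progress)        # least-satisfied objective first
        weights = np.empty(self.n)
        weights[order] = self.decay ** -np.arange(self.n)  # 1, 1/2, 1/4, ...
        return float(weights @ np.asarray(per_objective_rewards))
```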

    Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation

    Full text link
    Simulation to Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning (DRL) methods. Currently, Sim2Real uses Asymmetric Actor-Critic approaches to reduce the rich, idealized features available in simulation to the ones accessible in the real world. However, this feature reduction is conducted through an empirically defined one-step curtailment. Cutting too few features leaves the actor dependent on features that are hard to provide on the physical system, while cutting too many makes training difficult and inefficient. To address this issue, we propose Curriculum-based Sensing Reduction, which lets the actor start with the same rich feature space as the critic and then sheds the hard-to-extract features step by step, yielding higher training performance and better adaptation to the real-world feature space. The reduced features are replaced with random signals from a Deep Random Generator, which removes the dependency between the output and the removed features without creating new dependencies. The methods are evaluated on the Allegro robot hand in a real-world in-hand manipulation task. The results show that our methods train faster and achieve higher task performance than baselines, and can solve real-world tasks when selected tactile features are reduced.
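    The step-by-step curtailment plus random replacement lends itself to a compact sketch. The schedule format and per-feature scalar replacement below are illustrative assumptions, and a plain pseudo-random generator stands in for the paper's Deep Random Generator.

```python
# Sketch: curriculum-based sensing reduction for the actor (illustrative only).
import numpy as np

class CurriculumSensor:
    def __init__(self, feature_dim, removal_schedule):
        # removal_schedule: list of (training_step, feature_index) pairs giving
        # when each hard-to-extract feature is curtailed from the actor input.
        self.dim = feature_dim
        self.schedule = sorted(removal_schedule)
        self.removed = set()
        self.rng = np.random.default_rng(0)  # stand-in for the random generator

    def observe(self, full_obs, step):
        # Drop features whose removal time has passed, substituting random
        # signals of the same shape instead of zeros so the policy cannot
        # learn a new "feature missing" pattern from the replacement.
        for t, idx in self.schedule:
            if step >= t:
                self.removed.add(idx)
        obs = np.array(full_obs, dtype=float)
        for idx in self.removed:
            obs[idx] = self.rng.standard_normal()
        return obs
```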

    A Multi-Agent Approach for Adaptive Finger Cooperation in Learning-based In-Hand Manipulation

    Full text link
    In-hand manipulation is challenging for a multi-finger robotic hand due to its high degrees of freedom and the complex interaction with the object. To enable in-hand manipulation, existing deep reinforcement learning approaches mainly train a single robot-structure-specific policy through a centralized learning mechanism, which lacks adaptability to changes such as robot malfunction. To address this limitation, this work treats each finger as an individual agent and trains multiple agents to control their assigned fingers cooperatively to complete the in-hand manipulation task. We propose the Multi-Agent Global-Observation Critic and Local-Observation Actor (MAGCLA) method, in which the critic observes all agents' actions globally while each actor observes only its neighbors' actions locally. In addition, conventional individual experience replay may cause unstable cooperation due to the asynchronous performance increments of the agents, which is critical for in-hand manipulation tasks. To solve this issue, we propose the Synchronized Hindsight Experience Replay (SHER) method to synchronize and efficiently reuse the replayed experience across all agents. The methods are evaluated in two in-hand manipulation tasks on the Shadow dexterous hand. The results show that SHER helps MAGCLA achieve learning efficiency comparable to a single policy, and that MAGCLA generalizes better across tasks. The trained policies show higher adaptability in the robot malfunction test than baseline multi-agent and single-agent approaches.
    Comment: Submitted to ICRA 202
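    The synchronization idea in SHER can be pictured as all finger agents relabeling one shared episode with the same hindsight goal, so their relabeled rewards stay aligned. The buffer layout and function below are assumptions for illustration, not the published implementation.

```python
# Sketch: synchronized hindsight relabeling across agents (illustrative only).
def synchronized_her(episode, agents, reward_fn):
    """Relabel one shared episode with the same hindsight goal for all agents.

    episode: list of dicts with per-agent 'obs'/'action' plus 'achieved_goal'.
    agents: iterable of agent ids; reward_fn(achieved, goal) -> float.
    """
    hindsight_goal = episode[-1]["achieved_goal"]  # one goal shared by all agents
    transitions = []
    for t, step in enumerate(episode[:-1]):
        nxt = episode[t + 1]
        for agent in agents:
            transitions.append({
                "agent": agent,
                "obs": step["obs"][agent],
                "action": step["action"][agent],
                "next_obs": nxt["obs"][agent],
                "goal": hindsight_goal,
                # Every agent sees the same relabeled reward, keeping their
                # experience and performance increments in sync.
                "reward": reward_fn(nxt["achieved_goal"], hindsight_goal),
            })
    return transitions
```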

    Learn and Transfer Knowledge of Preferred Assistance Strategies in Semi-autonomous Telemanipulation

    Full text link
    Enabling a robot to provide effective assistance while still accommodating the operator's commands in telemanipulation of an object is very challenging: the robot's assistive actions are not always intuitive to human operators, and human behaviors and preferences are sometimes ambiguous for the robot to interpret. Although various assistance approaches are being developed to improve control quality from different optimization perspectives, it remains unclear how to determine an approach that satisfies both the fine motion constraints of the telemanipulation task and the preferences of the operator. To address these problems, we developed a novel preference-aware assistance knowledge learning approach. An assistance preference model learns what assistance a human prefers, and a stagewise model-updating method ensures learning stability while dealing with the ambiguity of human preference data. This preference-aware assistance knowledge enables a teleoperated robot hand to provide more active yet preferred assistance toward manipulation success. We also developed knowledge transfer methods to carry the preference knowledge across different robot hand structures, avoiding extensive robot-specific training. Experiments in which a 3-finger hand and a 2-finger hand were telemanipulated to use, move, and hand over a cup have been conducted. The results demonstrate that the methods enabled the robots to effectively learn the preference knowledge and allowed knowledge transfer between robots with less training effort.
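    One way to picture the stagewise model updating is a per-stage gate that skips updates when the collected preference labels are too ambiguous. The agreement threshold and the `fit_incremental` model API below are hypothetical; this is a sketch of the stability idea, not the authors' method.

```python
# Sketch: stagewise preference-model updating with an ambiguity gate
# (illustrative only; model.fit_incremental is a hypothetical API).
def stagewise_update(model, stage_data, agreement_threshold=0.8):
    """Update the assistance-preference model one stage at a time.

    stage_data: list of (features, preferred) pairs collected in one stage,
    where `preferred` is a noisy binary label from the human operator.
    """
    labels = [preferred for _, preferred in stage_data]
    agreement = max(labels.count(True), labels.count(False)) / len(labels)
    if agreement < agreement_threshold:
        # Ambiguous human feedback: skip this stage to keep learning stable.
        return model
    majority = labels.count(True) > labels.count(False)
    for features, _ in stage_data:
        model.fit_incremental(features, majority)  # hypothetical model API
    return model
```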

    Lightweight object detection algorithm based on YOLOv5 for unmanned surface vehicles

    Get PDF
    Visual detection technology is essential for an unmanned surface vehicle (USV) to perceive its surrounding environment; it determines the spatial position and category of objects, providing important environmental information for path planning and collision avoidance. During a close-in reconnaissance mission, a USV must navigate swiftly in a complex maritime environment, so an object detection algorithm for USVs needs both high detection speed and high accuracy. In this paper, a lightweight YOLOv5-based object detection algorithm using a Ghost module and a Transformer is proposed for USVs. Firstly, in the backbone network, the original convolution operations of YOLOv5 are replaced by the Ghost module's stacking of standard and depth-wise convolutions. Secondly, to enhance feature extraction without deepening the network, we integrate a Transformer at the end of the backbone network and into the Feature Pyramid Network structure of YOLOv5, improving the network's feature-expression ability. Lastly, the proposed algorithm and six other deep learning algorithms were tested on ship datasets. The results show that the average accuracy of the proposed algorithm is higher than that of the other six algorithms. In particular, compared with the original YOLOv5 model, the model size is reduced to 12.24 M, the frame rate reaches 138 frames per second, detection accuracy improves by 1.3%, and mAP@0.5 reaches 96.6% (from 95.3%). In a verification experiment, the proposed algorithm was tested on ship video collected by the "JiuHang 750" USV under different marine environments. The test results show significantly improved detection accuracy compared with other lightweight detection algorithms.
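    A Ghost module of the kind described, in which a thin primary convolution is augmented by cheap depth-wise "ghost" operations, can be sketched in PyTorch as follows; the activation choice and hyperparameters are illustrative, not the paper's exact configuration.

```python
# Sketch: Ghost module replacing a standard convolution (illustrative only).
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3):
        super().__init__()
        init_ch = math.ceil(oup / ratio)       # channels from the primary conv
        cheap_ch = init_ch * (ratio - 1)       # channels from cheap depthwise ops
        self.oup = oup
        self.primary = nn.Sequential(
            nn.Conv2d(inp, init_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),  # depthwise: one group per channel
            nn.BatchNorm2d(cheap_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        # Concatenate primary features with their cheap "ghost" copies and
        # trim to the requested output width.
        return torch.cat([y, self.cheap(y)], dim=1)[:, : self.oup]
```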

    Identification of Magnetic Interactions and High-field Quantum Spin Liquid in α-RuCl₃

    Full text link
    The frustrated magnet α-RuCl₃ constitutes a fascinating quantum material platform that harbors intriguing Kitaev physics. However, no consensus has been reached on its intricate spin interactions and field-induced quantum phases. Here we exploit multiple state-of-the-art many-body methods and determine a microscopic spin model that quantitatively explains the major observations in α-RuCl₃, including the zigzag order, the double-peak specific heat, the magnetic anisotropy, and the characteristic M-star dynamical spin structure. According to our model simulations, an in-plane field drives the system into the polarized phase at about 7 T, and a thermal fractionalization occurs at finite temperature, reconciling observations from different experiments. Under out-of-plane fields, the zigzag order is suppressed at 35 T, above which, and below a polarization field on the 100 T scale, a field-induced quantum spin liquid emerges. The fractional entropy and the algebraic low-temperature specific heat unveil the nature of a gapless spin liquid, which can be explored in high-field measurements on α-RuCl₃.
    Comment: To appear in Nature Communications (12 pages, 6 figures, and 5 Supplementary Notes)
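    For orientation, the extended Kitaev honeycomb model commonly fitted to α-RuCl₃ takes the generic K-J-Γ form below; which additional couplings are retained (e.g., a Γ′ term or third-neighbor Heisenberg exchange), and their values, vary between studies and are not taken from this abstract.

```latex
% Generic K-J-Gamma Hamiltonian for bond-dependent honeycomb magnets such as
% alpha-RuCl3; (alpha, beta, gamma) is the permutation of (x, y, z) fixed by
% the bond direction gamma. Specific couplings and values are not asserted.
H = \sum_{\langle ij \rangle_{\gamma}}
      \Big[ K \, S_i^{\gamma} S_j^{\gamma}
          + J \, \mathbf{S}_i \cdot \mathbf{S}_j
          + \Gamma \big( S_i^{\alpha} S_j^{\beta} + S_i^{\beta} S_j^{\alpha} \big)
      \Big]
    \; - \; \sum_i \mathbf{B} \cdot \mathbf{S}_i
```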

    Multi-cell battery modeling

    No full text
    Batteries are used in a wide range of electronic devices, from laptop computers to large-scale energy storage. Because battery energy capacity is limited, battery management is critical to system performance. To maximize a battery's performance, it is important to derive a mathematical battery model that quantitatively characterizes the major nonlinear capacity effects, such as rate-dependent capacity, the recovery effect, the temperature effect, and capacity fading, as well as the nonlinear circuit characteristics, such as open-circuit voltage, internal resistance, and output voltage. Such a model enables a thorough understanding of battery behavior under various operating conditions and is useful for circuit simulation, multi-cell battery design and analysis, battery maintenance, and battery performance prediction and optimization. Various battery models have been developed in the literature, but none considers all major nonlinearities, and existing models do not consider cell connection or cell-to-cell variation. This research addresses this fundamental problem by developing an accurate battery model, based on electrical circuit analysis, that accounts for cell connection, cell-to-cell variation, and all major battery nonlinear effects; the proposed model is validated with experimental data. (1) A comprehensive, circuit-based single-cell battery model is designed to accurately capture battery performance under both constant and variable load profiles; it considers the battery's nonlinear capacity effects and nonlinear circuit characteristics. (2) A comprehensive, circuit-based multi-cell battery model is developed that accurately estimates performance while accounting for all battery nonlinearities, cell connection, and cell-to-cell variation; a novel algorithm accurately derives the current distribution across cells connected in parallel, and its computational complexity grows linearly with the number of cells, making the model ready for large-scale battery systems. (3) A cell interaction model is studied to evaluate the effectiveness of parallel connection for cells with cell-to-cell SOC variation; based on this model, a theoretical bound on cell-to-cell variation for a given number of cells and a required load is derived, along with the maximum number of cells for a given cell-to-cell variation distribution and required load. Both bounds enable optimization of battery performance down to the cell level.
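    The parallel-string current split in contribution (2) follows from Kirchhoff's laws once each cell is reduced to an open-circuit voltage behind an internal resistance: all cells share one terminal voltage, and the cell currents must sum to the load. A minimal sketch under that reduced-circuit assumption (not the dissertation's algorithm) is below; note the closed form is linear in the number of cells.

```python
# Sketch: current distribution among parallel cells (illustrative only).
import numpy as np

def parallel_cell_currents(ocv, r_int, i_load):
    """Split a load current among parallel cells with differing OCV and R.

    ocv: per-cell open-circuit voltage [V] (reflects cell-to-cell SOC spread).
    r_int: per-cell internal resistance [ohm].
    i_load: total discharge current drawn from the string [A].
    Returns (terminal_voltage, per_cell_currents); cost is O(n) in cell count.
    """
    ocv, r_int = np.asarray(ocv, float), np.asarray(r_int, float)
    g = 1.0 / r_int                          # per-cell conductance
    # All cells share one terminal voltage V, and currents sum to the load:
    #   I_k = (OCV_k - V) / R_k,   sum_k I_k = i_load
    v = (g @ ocv - i_load) / g.sum()
    i_cells = (ocv - v) * g
    return v, i_cells

# Example: two cells, one at higher SOC, under a 10 A load.
v, i = parallel_cell_currents(ocv=[4.0, 3.9], r_int=[0.05, 0.05], i_load=10.0)
# The higher-SOC cell supplies more of the load (here 6 A vs 4 A at V = 3.7 V).
```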

    Charge Management Optimization for Future TOU Rates

    No full text
    The effectiveness of future time-of-use (TOU) rates in enabling managed charging depends on each vehicle's charging flexibility and on the benefits to owners. This paper adopts opportunity, delayed, and smart charging methods to quantify these impacts, flexibilities, and benefits. Simulation results show that the delayed and smart charging methods can shift most charging events to lower TOU rate periods without compromising the energy charged or individual drivers' mobility needs.
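    A toy comparison of the three charging methods under an hourly TOU rate is sketched below; the rate profile, plug-in window, and energy need are made-up illustrative numbers, not the paper's simulation setup.

```python
# Sketch: opportunity vs. delayed vs. smart charging under a TOU rate
# (illustrative only; hours past 23 wrap into the next day via h % 24).
import numpy as np

def charge_schedule(rates, arrive, depart, hours_needed, method):
    """Pick charging hours within [arrive, depart) for a fixed energy need."""
    window = list(range(arrive, depart))
    if method == "opportunity":      # charge immediately on plug-in
        return window[:hours_needed]
    if method == "delayed":          # wait until the off-peak period starts
        off_peak = [h for h in window if rates[h % 24] == min(rates)]
        return (off_peak + window)[:hours_needed]
    if method == "smart":            # fill the cheapest hours in the window
        return sorted(window, key=lambda h: rates[h % 24])[:hours_needed]

rates = np.array([0.10] * 16 + [0.30] * 6 + [0.10] * 2)  # peak 16:00-22:00
for m in ("opportunity", "delayed", "smart"):
    hours = charge_schedule(rates, arrive=18, depart=30, hours_needed=4, method=m)
    print(m, sorted(hours), f"cost={sum(rates[h % 24] for h in hours):.2f}")
# Opportunity charges at 18:00-22:00 (peak); delayed and smart both shift the
# four hours into the off-peak window, cutting the cost from 1.20 to 0.40.
```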

    Learn Task First or Learn Human Partner First: A Hierarchical Task Decomposition Method for Human-Robot Cooperation

    Full text link
    Applying Deep Reinforcement Learning (DRL) to Human-Robot Cooperation (HRC) in dynamic control problems is promising yet challenging, as the robot needs to learn both the dynamics of the controlled system and the dynamics of its human partner. In existing research, the DRL-powered robot uses a coupled observation of the environment and the human partner to learn both dynamics simultaneously; however, this learning strategy limits learning efficiency and team performance. This work proposes a novel task decomposition method with a hierarchical reward mechanism that enables the robot to learn the hierarchical dynamic control task separately from learning the human partner's behavior. The method is validated in a hierarchical control task in a simulated environment with human-subject experiments, and it also provides insight into the design of learning strategies for HRC. The results show that the robot should learn the task first to achieve higher team performance, and learn the human first to achieve higher learning efficiency.
    Comment: Accepted by SMC202
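    The "learn task first" decomposition can be pictured as a two-stage training loop in which a scripted partner stands in for the human during the first stage. The `set_partner`/`learn` APIs and the step counts below are hypothetical, a sketch of the staging idea rather than the paper's procedure.

```python
# Sketch: two-stage task/human decomposition for HRC training
# (illustrative only; env.set_partner and agent.learn are hypothetical APIs).
def train_decomposed(agent, env, scripted_partner, human_partner,
                     task_steps=100_000, human_steps=50_000):
    # Stage 1: learn the dynamic control task against a predictable scripted
    # partner, so the task dynamics are mastered in isolation.
    env.set_partner(scripted_partner)
    agent.learn(env, steps=task_steps)
    # Stage 2: keep the learned task skill and now adapt to the human
    # partner's behavior on top of it.
    env.set_partner(human_partner)
    agent.learn(env, steps=human_steps)
    return agent
```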